FAST ON-CHIP DELAY ESTIMATION FOR CELL-BASED EMITTER COUPLED LOGIC
Abstract

The goal of this work is to produce fast, but accurate, estimates of best and
worst case delay for on-chip emitter coupled logic (ECL) nets. The work
consists of two major parts: 1) macromodelling of ECL logic gates acting as
both sources and loads; and 2) delay estimation for individual nets using the
gate macromodel parameters and RC tree models for metal interconnect. Both of
the above functions (gate macromodelling and delay estimation) have been
extensively tested on an industrial ECL process and cell (i.e., logic gate)
library.

The success of a macromodelling approach relies on repetitive use of members
of a library of modelled cells. A "fixed" computational cost (several c.p.u.
hours per cell) is paid to obtain simplified macromodel parameter values.
Resultant timing estimates are typically within 5-10% of SPICE [1] and are
obtained roughly three orders of magnitude more quickly than SPICE.

I.  INTRODUCTION 


$$D_{metal} = T_{AC} - T_{AB}(0) = \left[ T_{AB}(L) - T_{AB}(0) \right] + T_{BC}. \qquad (1)$$


So, "metal delay" has two distinct components: extra delay through the source
gate caused by the loading of the source gate by metal, and propagation delay
through the metal itself. Worst case (or "slow") metal delay is simply the
definition in (1) evaluated using slow SPICE process parameters for the logic
gates and metal interconnect, the maximum expected input risetime at point A,
and a slow target voltage threshold at points B and C. The slow target voltage
thresholds for rising and falling transitions are chosen to be, respectively:


$$V_{T,slow,rise} = \frac{V_{LOW} + V_{HIGH}}{2} + V_{noise\ margin} \qquad (2)$$

$$V_{T,slow,fall} = \frac{V_{LOW} + V_{HIGH}}{2} - V_{noise\ margin} \qquad (3)$$


for some $V_{noise\ margin} > 0$. The definition of best case (or "fast") metal
delay is made in a similar way with fast versions of SPICE parameters, input
risetime, and output voltage threshold.
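The two definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the numeric values in the usage note are hypothetical, and the thresholds assume that (2) and (3) place the slow threshold one noise margin above or below the midpoint of the logic swing.

```python
def metal_delay(t_ab_loaded, t_ab_unloaded, t_bc):
    """Split metal delay, as in (1), into its two components.

    t_ab_loaded   -- T_AB(L): source gate delay when loaded by the net
    t_ab_unloaded -- T_AB(0): delay of the unloaded source gate
    t_bc          -- T_BC: propagation delay through the metal
    Returns (extra source gate delay, propagation delay, total metal delay).
    """
    extra_source_delay = t_ab_loaded - t_ab_unloaded
    return extra_source_delay, t_bc, extra_source_delay + t_bc


def slow_thresholds(v_low, v_high, v_noise_margin):
    """Slow target thresholds (rising, falling).

    Assumption: midpoint of the swing plus or minus the noise margin.
    """
    if v_noise_margin <= 0:
        raise ValueError("noise margin must be positive")
    midpoint = 0.5 * (v_low + v_high)
    return midpoint + v_noise_margin, midpoint - v_noise_margin
```

For example, with hypothetical delays `metal_delay(1.5, 1.0, 0.25)` the extra source gate delay is 0.5 and the propagation delay is 0.25, so the total metal delay is 0.75 in the same time unit.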


Definition  of  terms 

The definition of "metal delay" can best be illustrated by a simplified
interconnect net with no branching and only one load gate [Fig. 1]. Let
$T_{AB}(0)$ represent delay through the unloaded source gate. Let $T_{AB}(L)$
represent delay through the source gate loaded by an interconnect net of
length L, as shown in Fig. 1. For all gates in our cell library, $T_{AB}(0)$
is known. What our algorithm estimates is "metal delay", defined by (1).

Figure 1: Simplified interconnect net.


Gate  delay  vs.  interconnect  delay 

Previous work on waveform bounding and estimation for RC tree networks
[2,3,4], with application to MOS circuits, has focused on the propagation
component ($T_{BC}$) of interconnect delay. Relatively simple models are used
for the logic gates themselves. More recently, detailed macromodels for MOS
logic gates [5] have been developed and used in conjunction with the RC tree
delay estimation results. Good correlation with SPICE is obtained by fitting
macromodel parameters to selected SPICE experiments. We develop macromodels
specifically for ECL gates at a level of detail similar to [5]. This fills a
definite need since most reported work in this area has been for MOS circuits,
even though bipolar digital circuits are presently in wide use for
high-performance applications. Recent work that has been reported for bipolar
circuits [6,7,8] is concerned mainly with logic simulation, and the timing
models used are relatively simple.


To emphasize the importance of accurately modelling the source gate (as
opposed to just the interconnect), in Fig. 2 we plot separately the two
components of (rising transition, worst case) metal delay versus total load
net capacitance for a uniformly distributed metal line in the simplified
topology of Fig. 1. The gate used as both the source and the load is a
standard 2-input OR/NOR. We denote the total load net capacitance by
$C_{NET}$ and the maximum expected value of $C_{NET}$ by $C_{MAX}$. For low
values of $C_{NET}$, extra source gate delay dominates propagation delay; the
two become equal only when $C_{NET}/C_{MAX} \approx 0.51$. Furthermore, the
statistical distribution of $C_{NET}$ is not uniform across $[0, C_{MAX}]$,
but rather is skewed towards lower capacitance values. In fact, for our
designs, 90% of the load nets have $C_{NET}/C_{MAX}$ values under 0.25, where
propagation delay is only 42% of extra source gate delay. In addition, for a
falling transition, extra source gate delay is even more important than shown
in Fig. 2, typically exceeding propagation delay throughout the entire range
$0 < C_{NET} < C_{MAX}$. Therefore, extra source gate delay is typically the
dominant component of metal delay.

Figure 2: Two components of $D_{metal}$ vs. total load net capacitance.


II.  LOGIC  GATE  MACROMODELLING 


Load  modelling 


Modelling of ECL gates as loads is very simple. As in MOS, a single linear
lumped capacitor is an acceptable load model. Modelling of leakage current in
ECL loads does not appear to be necessary. For our process, the worst case
(maximum expected metal resistance, maximum expected fanout) voltage drop
through metal due to steady state leakage current is less than 3% of the
rail-to-rail voltage swing. In situations where leakage current might be
modelled (e.g., clock nets with very high fanout), this could be done by
adding a linear resistor and a d.c. current source in parallel with the
capacitor [9,10]. The capacitance value is extracted from SPICE simulations of
transient voltage at the load and current into the load. Since on-chip metal
is modelled as a linear RC tree, and load gates as simple linear capacitors,
an entire load net (metal and loads) is modelled as a linear RC tree.
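The text does not say how the capacitance is computed from the simulated waveforms. One plausible reading, shown here only as a sketch, is to divide the charge delivered to the load by the voltage change at the load over one transition. The function name and the sampled-waveform inputs are assumptions made for illustration.

```python
def extract_load_capacitance(times, voltages, currents):
    """Estimate a lumped load capacitance from sampled transient data.

    Assumption (not stated in the text): C = delivered charge / voltage change,
    with the charge obtained by trapezoidal integration of the load current.
    times, voltages, currents -- equal-length sequences of samples
    """
    if not (len(times) == len(voltages) == len(currents)) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    delta_v = voltages[-1] - voltages[0]
    if delta_v == 0:
        raise ValueError("voltage does not change over the window")
    return charge / delta_v
```

The window should span one complete transition, so that the steady state leakage current discussed above contributes little to the integrated charge.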

Source  modelling 

Modelling of ECL gates as sources is more difficult. Because of an
emitter-follower output stage, source gate behavior exhibits a fundamental
asymmetry between rising and falling transitions. For a falling transition,
there is a limitation on transient sinking current as $C_{NET}$ increases. We
use two different source model types: the first one for falling transitions
when $C_{NET} > C_{THRESH}$ (to model the transient sinking current
limitation), and the second one for falling transitions when
$C_{NET} < C_{THRESH}$ and for all rising transitions. The "threshold"
capacitance ($C_{THRESH}$) is determined from SPICE simulations of transient
source gate output current ($I_{out}$) during falling transitions.
$C_{THRESH}$ is defined to be the value of $C_{NET}$ where the sensitivity of
$|I_{out}|_{max}$ to a perturbation in $C_{NET}$ drops below a predetermined
value.
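A minimal sketch of this threshold search is given below. It assumes that "sensitivity" means the finite-difference slope of $|I_{out}|_{max}$ with respect to $C_{NET}$ between adjacent simulated points; the text does not specify whether the sensitivity is normalized, and the function name and sample data are hypothetical.

```python
def find_c_thresh(c_values, i_max_values, sensitivity_limit):
    """Return the first C_NET at which the slope of |I_out|max falls below the limit.

    c_values          -- strictly increasing C_NET sample points
    i_max_values      -- peak |I_out| from a transient simulation at each point
    sensitivity_limit -- predetermined slope limit (assumed unnormalized)
    Returns the lower end of the first interval whose slope is below the
    limit, or None if the current never saturates within the sampled range.
    """
    if len(c_values) != len(i_max_values):
        raise ValueError("sample lists must have equal length")
    for k in range(1, len(c_values)):
        dc = c_values[k] - c_values[k - 1]
        if dc <= 0:
            raise ValueError("c_values must be strictly increasing")
        slope = abs(i_max_values[k] - i_max_values[k - 1]) / dc
        if slope < sensitivity_limit:
            return c_values[k - 1]
    return None
```

A finer capacitance sweep near the returned value would tighten the estimate, at the cost of more SPICE runs per cell.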

To model the sinking current limitation, the first source gate model is simply
a delayed current source pulse. The model delay before the onset of the
current pulse and the magnitude of the pulse ($I_{SAT}$) are extracted from
the same SPICE simulations used to determine $C_{THRESH}$. The duration of the
pulse is exactly long enough to sink the correct amount of charge:

$$\text{pulse duration} \triangleq \frac{C_{NET} \, (V_{HIGH} - V_{LOW})}{I_{SAT}}. \qquad (4)$$
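The delayed current pulse can be illustrated with a short sketch. It assumes a purely capacitive load (that is, it ignores $R_{NET}$, which the full model includes), so the output falls linearly while the pulse is on. Function names and the numbers in the usage note are hypothetical.

```python
def pulse_duration(c_net, v_high, v_low, i_sat):
    """Pulse duration from (4): time for I_SAT to remove C_NET * (V_HIGH - V_LOW)."""
    return c_net * (v_high - v_low) / i_sat


def falling_output_voltage(t, c_net, v_high, v_low, i_sat, pulse_delay):
    """Output voltage at time t for the delayed current source pulse model.

    Simplifying assumption: the load is a single capacitor C_NET with no
    series resistance, so the voltage ramps linearly during the pulse.
    """
    duration = pulse_duration(c_net, v_high, v_low, i_sat)
    # Time for which the pulse has been sinking current, clamped to [0, duration].
    t_on = min(max(t - pulse_delay, 0.0), duration)
    return v_high - i_sat * t_on / c_net
```

With hypothetical values of 1 pF, a swing from -0.9 V to -1.7 V and 1 mA of sinking current, the pulse lasts 0.8 ns, and the output sits at the high level until the pulse delay has elapsed.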

The second source gate model is based on the source gate's d.c. drive curves,
which show the static output voltage vs. input voltage relationship for
different values of output load current. To describe a family of
three-segment piecewise-linear approximations to the d.c. drive curves, four
"d.c. parameters" are obtained by curve fitting to d.c. SPICE output (see
also [3,5]). These d.c. parameters alone are sufficient to model the source
gate's response to slow inputs, when the gate behaves quasi-statically.
However, an ECL input is usually too fast for the source gate to respond
quasi-statically. The source gate responds somewhat more slowly than the
quasi-static model alone would predict. So, four additional "dynamic
parameters" are extracted from SPICE data of transient source gate responses
in order to model, as a function of $C_{NET}$, the departure from a purely
quasi-static response.

Each of the two source gate models is used in conjunction with an
approximation to the driving-point admittance of the load net given by a
single lumped RC segment ($R_{NET}$, $C_{NET}$). Based on values for
$R_{NET}$, $C_{NET}$, the source gate macromodel parameters, and the input
risetime, a closed-form analytical expression for the model waveform $v_B(t)$
is obtained. The detailed model expressions are omitted here for brevity, but
can

Macromodel parameter summary

A total of 2(lev) load gate macromodel parameter values (capacitance) are
extracted for each cell, where lev denotes the number of input levels of the
cell being modelled. (Note: the term "input level" refers to a subset of the
individual input signals that affect the current-steering logic through the
same number of base-emitter junction voltage drops.) Capacitance values are
obtained for each combination of slow/fast SPICE parameters and different
input level.

A total of 76 source gate macromodel parameter values are extracted for each
cell: 12 for the first source model (i.e., falling transition and
$C_{NET} > C_{THRESH}$), and 64 for the second source model. For the first
model: 3 parameters ($C_{THRESH}$, $I_{SAT}$, and current pulse delay) for
each combination of slow/fast SPICE parameters and "true/complement" output
side of the cell. For the second model: 8 parameters (4 "d.c." and 4
"dynamic") for each combination of slow/fast SPICE parameters,
"true/complement" output side of the cell, and rising/falling output
transition.
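The parameter counts quoted above follow from simple products over the modelling combinations. The short sketch below reproduces the arithmetic; the function names are illustrative only.

```python
def load_parameter_count(lev):
    """Load capacitances per cell: one per (slow/fast, input level) pair."""
    return 2 * lev


def source_parameter_counts():
    """Return (first model, second model, total) source parameter counts per cell."""
    corners = 2       # slow / fast SPICE parameters
    sides = 2         # true / complement output side
    transitions = 2   # rising / falling output transition
    # First model: C_THRESH, I_SAT and pulse delay; falling transitions only.
    first = 3 * corners * sides
    # Second model: 4 d.c. plus 4 dynamic parameters, for both transitions.
    second = (4 + 4) * corners * sides * transitions
    return first, second, first + second
```

Calling `source_parameter_counts()` returns `(12, 64, 76)`, matching the totals in the text.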

III.  REDUCED-ORDER  INTERCONNECT 
MODELS 

Driving-point  admittance  approximation 

As mentioned in the previous section, to enable the computation of an
analytical expression for the model waveform $v_B(t)$, the driving-point
admittance of the load net is approximated by a single lumped RC segment
($R_{NET}$, $C_{NET}$). The values of $R_{NET}$ and $C_{NET}$ are chosen to
match the first two terms of the Taylor series expansion around $s = 0$ of the
driving-point admittance of the given load net,

$$Y_{LOAD\ NET}(s) = \sum_{n=1}^{\infty} y_n s^n, \qquad (5)$$

where the series representation is valid only within some circle of
convergence $|s| < r_{conv}$. Our approximate driving-point admittance is:

$$Y(s) = \frac{s C_{NET}}{1 + s R_{NET} C_{NET}} = \sum_{n=1}^{\infty} (-1)^{n-1} R_{NET}^{\,n-1} C_{NET}^{\,n} s^n, \qquad (6)$$

where the second equality is valid only within the circle of convergence
$|s| < 1/(R_{NET} C_{NET})$. So to match both the $s$ and $s^2$ terms, we
choose:

$$C_{NET} = y_1 \qquad (7)$$

$$R_{NET} = -\,y_2 / y_1^2. \qquad (8)$$

The approximation is computed quickly using an algorithm [11] which allows the
computation of $y_1$ and $y_2$ (and, hence, of $R_{NET}$ and $C_{NET}$) to
proceed sequentially upstream from the leaf nodes of the load net until the
source gate output is finally reached. The low-frequency nature of this
approximation, implicit in the use of a Taylor series expansion around
$s = 0$, turns out to be entirely justified. For our process, most of the
frequency content of a typical source gate output waveform lies well inside
the circle of convergence for both the true and approximate driving-point
admittance [11].
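An upstream computation of this kind can be sketched as follows. This is not necessarily the algorithm of the cited reference; it is a standard moment recursion, assuming each tree branch is a series resistor feeding a node with a lumped capacitor. At a node the admittances of the capacitor and all child branches add, and looking through a series resistance $r$ gives $Y/(1 + rY) \approx y_1 s + (y_2 - r y_1^2) s^2$. The `Branch` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Branch:
    r: float                      # series resistance feeding this node
    c: float                      # capacitance lumped at this node (metal + loads)
    children: List["Branch"] = field(default_factory=list)


def admittance_moments(branch: Branch) -> Tuple[float, float]:
    """Return (y1, y2) of the admittance seen looking into `branch`."""
    y1, y2 = branch.c, 0.0
    for child in branch.children:
        child_y1, child_y2 = admittance_moments(child)
        y1 += child_y1            # parallel admittances add term by term
        y2 += child_y2
    # Series resistor: Y/(1 + rY) = y1*s + (y2 - r*y1**2)*s**2 + ...
    return y1, y2 - branch.r * y1 * y1


def lumped_rc(branch: Branch) -> Tuple[float, float]:
    """Return (R_NET, C_NET) matching the s and s^2 terms, as in (7) and (8)."""
    y1, y2 = admittance_moments(branch)
    return -y2 / (y1 * y1), y1
```

As a sanity check on the sketch, a uniform line split into many equal segments gives $C_{NET}$ equal to the total line capacitance and $R_{NET}$ close to one third of the total line resistance, which is the known low-frequency result for a distributed RC line.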


Voltage  transfer  function  approximation 

We propagate the model source gate output voltage waveform ($v_B(t)$)
downstream to the load(s) of interest by convolving with an approximate unit
voltage impulse response first developed by Horowitz [3]. We use an
approximate impulse response because obtaining the precise impulse response of
a general RC tree is too computationally expensive. In addition, closed-form
analytical expressions are available for the approximate impulse response.
This allows, via convolution with the model source gate output voltage
waveform, computation of closed-form analytical expressions for the model
voltage waveform(s) at the load(s) ($v_C(t)$). The model voltage waveform at
each load is then numerically inverted, at the appropriate voltage threshold,
in order to compute the metal delay to that load.

Let $h(t)$ and $h_{approx}(t)$ denote, respectively, the true and approximate
unit voltage impulse response at a given load. Let $H(s)$ and $H_{approx}(s)$
denote, respectively at the same load, the Laplace transform of the true and
approximate impulse response. The approximate transfer function has two poles
and one zero:

$$H_{approx}(s) = \frac{1 + s\tau_z}{(1 + s\tau_1)(1 + s\tau_2)}. \qquad (9)$$

The time constants $\tau_1$, $\tau_2$, and $\tau_z$ are determined by the
following three constraints:

$$\int_0^{\infty} t \, h_{approx}(t) \, dt = \int_0^{\infty} t \, h(t) \, dt \qquad (10)$$
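The numerical inversion step can be illustrated with a simplified sketch. Instead of the model source gate waveform, it assumes a unit step at the source, for which a two-pole, one-zero transfer function of the form (9) has a closed-form response, and it finds the threshold crossing by bisection. The function names and the time constants in the usage note are hypothetical, and the response is assumed to rise monotonically.

```python
import math


def step_response(t, tau1, tau2, tau_z):
    """Unit step response of (1 + s*tau_z) / ((1 + s*tau1) * (1 + s*tau2)).

    Requires tau1 != tau2. Obtained by partial fraction expansion.
    """
    if t <= 0.0:
        return 0.0
    a = (tau1 - tau_z) / (tau1 - tau2)
    b = (tau2 - tau_z) / (tau2 - tau1)
    return 1.0 - a * math.exp(-t / tau1) - b * math.exp(-t / tau2)


def threshold_crossing_time(threshold, tau1, tau2, tau_z):
    """Invert the step response at a normalized threshold in (0, 1) by bisection."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    lo, hi = 0.0, max(tau1, tau2)
    while step_response(hi, tau1, tau2, tau_z) < threshold:
        hi *= 2.0                 # expand until the crossing is bracketed
    for _ in range(100):          # halve the bracket to machine precision
        mid = 0.5 * (lo + hi)
        if step_response(mid, tau1, tau2, tau_z) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When `tau_z` equals `tau2` the zero cancels one pole and the response reduces to a single exponential, so a threshold of 0.5 is crossed at `tau1 * ln 2`, which is a convenient check on the sketch.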
IV. RESULTS

Figure 3: Rising transition comparison.

Figure 4: Falling transition comparison.

In Figs. 3 and 4, we show comparisons of SPICE vs. our algorithm. In Fig. 3,
we use the same SPICE data shown in Fig. 2. In Fig. 4, we use the same gate
(2-input OR/NOR) and the same net topology (uniform unbranched line with a
single load), but we examine a falling transition. Two points to note about
Fig. 4 are:

1. the boundary between the two different source model types is
$C_{NET} = C_{THRESH} \approx 0.43\,C_{MAX}$ for this particular gate; and

2. the two $T_{BC}$ curves are nearly indistinguishable on the time scale of
the plot.

Assuming that gate macromodel parameters have been obtained in advance, the
computation speed-up relative to SPICE is approximately three orders of
magnitude. Similar accuracy and speed-up results are obtained using a wide
variety of different logic cells and non-uniform branched net topologies [11].